 cv task


USB: A Unified Semi-supervised Learning Benchmark for Classification

Neural Information Processing Systems

Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, popular SSL evaluation protocols are currently often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address these issues, we construct a Unified SSL Benchmark (USB) for classification by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate the dominant SSL methods. We also open-source a modular and extensible codebase for fair evaluation of these SSL methods. We further provide pre-trained versions of state-of-the-art neural models for CV tasks to make further tuning affordable. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains at lower cost. Specifically, on a single NVIDIA V100, only 39 GPU days are required to evaluate FixMatch on the 15 tasks in USB, whereas 335 GPU days (279 GPU days on the 4 CV datasets other than ImageNet) are needed on 5 CV tasks with TorchSSL.
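
The cost figures quoted in this abstract can be put side by side with a few lines of arithmetic. The snippet below is only a reading aid: the three GPU-day numbers and the task counts come from the abstract, while the variable names and the derived quantities (the ImageNet share, the overall ratio, and the per-task averages) are illustrative and are not reported by the authors.

```python
# GPU-day figures as quoted in the abstract (single NVIDIA V100, FixMatch).
usb_gpu_days = 39                 # 15 tasks in USB
torchssl_gpu_days = 335           # 5 CV tasks with TorchSSL
torchssl_without_imagenet = 279   # the 4 CV datasets other than ImageNet

# Derived quantities (not stated in the abstract, simple arithmetic only).
imagenet_gpu_days = torchssl_gpu_days - torchssl_without_imagenet
cost_ratio = torchssl_gpu_days / usb_gpu_days
usb_days_per_task = usb_gpu_days / 15
torchssl_days_per_task = torchssl_gpu_days / 5

print(f"ImageNet share of the TorchSSL cost: {imagenet_gpu_days} GPU days")
print(f"TorchSSL / USB total cost ratio: {cost_ratio:.2f}x")
print(f"Average per task: USB {usb_days_per_task:.1f}, TorchSSL {torchssl_days_per_task:.1f} GPU days")
```

The per-task averages are a rough comparison only, since the two benchmarks cover different tasks and model setups.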


LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer

arXiv.org Artificial Intelligence

With the rise of Transformer models in the NLP and CV domains, Multi-Head Attention (MHA) has proven to be a game-changer. However, its expensive computation poses challenges to model throughput and efficiency, especially for long-sequence tasks. Exploiting the sparsity in attention has proven to be an effective way to reduce computation. Nevertheless, prior works do not consider the varying distributions among different heads and lack a systematic method to determine the threshold. To address these challenges, we propose Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer (LATTE). LATTE employs a head-wise threshold-based filter with a low-precision dot product and a computation reuse mechanism to reduce the computation of MHA. Moreover, the trainable threshold is introduced to provide a systematic method for adjusting the thresholds and to enable end-to-end optimization. Experimental results indicate that LATTE can smoothly adapt to both NLP and CV tasks, offering significant computation savings with only a minor compromise in performance. The trainable threshold is also shown to be essential for the trade-off between performance and computation. As a result, LATTE filters up to 85.16% of keys with only a 0.87% accuracy drop in the CV task and 89.91% of keys with a 0.86 perplexity increase in the NLP task.
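
To make the filtering idea in this abstract more concrete, here is a minimal sketch of threshold-based key filtering for one query in one attention head. It is not the authors' implementation: the function names, the 4-bit uniform quantization scheme, the fallback when no key passes, and the fixed `threshold` argument are all assumptions made for illustration. In LATTE the threshold is trained per head, and the computation reuse mechanism is not modeled here.

```python
import math

def quantize(vec, bits=4):
    """Uniformly quantize a vector to signed integers (assumed scheme).

    Returns the integer vector and the step size that maps it back to floats.
    """
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in vec) or 1.0
    return [round(x / scale * levels) for x in vec], scale / levels

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filtered_attention(query, keys, values, threshold, bits=4):
    """Attention for a single query that skips keys with a low approximate score.

    `threshold` stands in for one head's threshold; here it is a fixed number.
    """
    q_int, q_step = quantize(query, bits)
    approx = []
    for key in keys:
        k_int, k_step = quantize(key, bits)
        # Cheap low-precision estimate of the query-key dot product.
        approx.append(dot(q_int, k_int) * q_step * k_step)

    kept = [i for i, score in enumerate(approx) if score >= threshold]
    if not kept:
        # Assumed fallback: keep the single best key so softmax is defined.
        kept = [max(range(len(keys)), key=lambda i: approx[i])]

    # Full-precision scaled dot-product attention over the surviving keys only.
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[i]) * scale for i in kept]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, kept)) / total
              for d in range(dim)]
    return output, kept

query = [1.0, 0.0]
keys = [[1.0, 0.0], [-1.0, 0.0], [0.9, 0.1]]
values = [[1.0], [2.0], [3.0]]
output, kept = filtered_attention(query, keys, values, threshold=0.0)
print(kept, output)
```

In this toy call the second key points away from the query, so its approximate score falls below the threshold and it is dropped before the full-precision softmax is computed.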


Allen AI & UW Propose Unified-IO: A High-Performance, Task-Agnostic Model for CV, NLP, and Multi-Modal Tasks

#artificialintelligence

Building a general-purpose unified model that can solve diverse tasks in different modalities while maintaining high performance is a long-standing challenge in the machine learning research community. A conventional approach in this direction is building models with task-specialized heads on top of a shared architectural backbone -- but such models require expert knowledge to design a specialized head for each task, and their lack of parameter-sharing for new tasks limits their transfer-learning capabilities. In the new paper Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks, a research team from the Allen Institute for AI and the University of Washington introduces UNIFIED-IO, a neural model with no task- or modality-specific branches that achieves competitive performance across a wide variety of computer vision (CV), natural language processing (NLP), and multi-modal benchmark tasks without fine-tuning. The researchers set out to build a unified neural architecture that ML practitioners with little or no knowledge of the underlying machinery could use to efficiently and effectively train their models for new NLP and CV tasks. For models to support a variety of modalities (images, language, boxes, binary masks, segmentation, etc.), they must represent all modalities in a shared space.


Open Source Datasets for Computer Vision - KDnuggets

#artificialintelligence

Computer Vision (CV) is one of the most exciting subfields within the Artificial Intelligence (AI) and Machine Learning (ML) domain. It is a major component for many modern AI/ML pipelines, and it's transforming almost every industry, enabling organizations to revolutionize the way machines and business systems work. Academically, CV has been a well-established area of computer science for many decades, and over the years, a lot of research has gone into this field to make it better. However, the use of deep neural networks has recently revolutionized the field and given it new fuel for accelerated growth. In this article, we discuss some of the most popular and effective datasets used in the domain of Deep Learning (DL) to train state-of-the-art ML systems for CV tasks.


Facebook & Inria Propose High-Performance Self-Supervised Technique for CV Tasks

#artificialintelligence

Researchers from Facebook and the French National Institute for Research in Digital Science and Technology (Inria) have developed a new technique for self-supervised training of convolutional networks used for image classification and other computer vision tasks. The proposed method surpasses supervised techniques on most transfer tasks and outperforms previous self-supervised approaches. "Our approach allows researchers to train efficient, high-performance image classification models with no annotations or metadata," the researchers write in a Facebook blog post. "More broadly, we believe that self-supervised learning is key to building more flexible and useful AI." Recent improvements in self-supervised training methods have established them as a serious alternative to traditional supervised training. Self-supervised approaches, however, are significantly slower to train than their supervised counterparts.


Machine Learning Terminology Explained: Top 8 Must-Know Concepts

#artificialintelligence

Getting started with AI? Perhaps you've already got your feet wet in the world of Machine Learning, but are still looking to expand your knowledge and cover the subjects you've heard of but didn't quite have time to cover? This Machine Learning Glossary aims to briefly introduce the most important Machine Learning terms, for both the commercially and the technically interested. It's not by any means exhaustive, but it is a good, light read to prepare for a meeting with an AI director or vendor, or a quick revisit before a job interview! Natural Language Processing (NLP) is a common notion for a variety of machine learning methods that make it possible for the computer to understand and perform operations using human (i.e. ... Text Classification and Ranking: The goal of this task is to predict a class (label) of a document, or to rank documents within a list based on their relevance.